The first thing I will do is load the data and all the R packages I need to analyze it, and then look at what I have.
## 'data.frame': 601229 obs. of 38 variables:
## $ MeetID : int 0 0 0 0 0 0 0 0 0 0 ...
## $ LifterID : int 1 2 2 2 3 4 5 5 6 6 ...
## $ Name : chr "Angie Belk Terry" "Dawn Bogart" "Dawn Bogart" "Dawn Bogart" ...
## $ Sex : chr "F" "F" "F" "F" ...
## $ Event : chr "SBD" "SBD" "SBD" "B" ...
## $ Equipment : chr "Wraps" "Single-ply" "Single-ply" "Raw" ...
## $ Age : int 47 42 42 42 18 28 60 60 52 52 ...
## $ Division : chr "Mst 45-49" "Mst 40-44" "Open Senior" "Open Senior" ...
## $ BodyweightKg : int 60 59 59 59 64 62 67 67 66 66 ...
## $ WeightClassKg : chr "60" "60" "60" "60" ...
## $ Squat1Kg : num 38.6 120.2 120.2 NA NA ...
## $ Squat2Kg : num 47.6 136.1 136.1 NA NA ...
## $ Squat3Kg : num -54.4 142.9 142.9 NA NA ...
## $ Squat4Kg : num NA NA NA NA NA ...
## $ BestSquatKg : num 47.6 142.9 142.9 NA NA ...
## $ Bench1Kg : num 15.9 88.5 88.5 88.5 29.5 ...
## $ Bench2Kg : num 20.4 95.2 95.2 95.2 31.8 ...
## $ Bench3Kg : num -24.9 -97.5 -97.5 -97.5 -34 ...
## $ Bench4Kg : num NA NA NA NA NA NA NA NA NA NA ...
## $ BestBenchKg : num 20.4 95.2 95.2 95.2 31.8 ...
## $ Deadlift1Kg : num 61.2 136.1 136.1 NA 90.7 ...
## $ Deadlift2Kg : num 70.3 149.7 149.7 NA -97.5 ...
## $ Deadlift3Kg : num -77.1 163.3 163.3 NA NA ...
## $ Deadlift4Kg : num NA NA NA NA NA NA NA NA NA NA ...
## $ BestDeadliftKg: num 70.3 163.3 163.3 NA 90.7 ...
## $ TotalKg : num 138.3 401.4 401.4 95.2 122.5 ...
## $ Place : chr "1" "1" "1" "1" ...
## $ Wilks : num 155 456 456 108 130 ...
## $ McCulloch : num 168 466 466 110 138 ...
## $ MeetPath : chr "365strong/1601" "365strong/1601" "365strong/1601" "365strong/1601" ...
## $ Federation : chr "365Strong" "365Strong" "365Strong" "365Strong" ...
## $ Date : chr "2016-10-29" "2016-10-29" "2016-10-29" "2016-10-29" ...
## $ MeetCountry : chr "USA" "USA" "USA" "USA" ...
## $ MeetState : chr "NC" "NC" "NC" "NC" ...
## $ MeetTown : chr "Charlotte" "Charlotte" "Charlotte" "Charlotte" ...
## $ MeetName : chr "Junior & Senior National Powerlifting Championships" "Junior & Senior National Powerlifting Championships" "Junior & Senior National Powerlifting Championships" "Junior & Senior National Powerlifting Championships" ...
## $ Group : chr "45-49" "40-44" "40-44" "40-44" ...
## $ GroupKG : chr "56-65" "56-65" "56-65" "56-65" ...
Obviously this is a fantastic and well-kept dataset! Great job, openpowerlifting.com!
I want to draw your attention to how many columns this dataset has, each containing unique information about each lifter. This gives us a lot of power to observe patterns in lifters’ skills and potential. First, I want to break lifters up into age ranges. For my purposes I don’t like the ones drawn by most federations; I want a bit higher resolution, so I will break them up into 5-year age groups and see how things go from there.
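The 5-year binning can be sketched in a few lines. This is a Python illustration, not the author’s R code. It assumes ages are numbers (fractional ages are truncated) and produces labels in the same "45-49" format that appears in the `Group` column of the `str()` output above. Returning `None` for a missing age is an assumption of this sketch.

```python
def age_group(age, width=5):
    """Label an age with its `width`-year bin, e.g. 47 -> "45-49".

    Returns None for a missing age (an assumption of this sketch).
    """
    if age is None:
        return None
    # Integer division finds the lower edge of the bin
    lower = int(age) // width * width
    return f"{lower}-{lower + width - 1}"


for a in (18, 24, 25, 42, 47, 53):
    print(a, age_group(a))
```

Because the bins are left-closed, 24 falls into "20-24" and 25 starts "25-29".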
I am also splitting them into men and women for physiological reasons. I will be asking questions about maximum lifting strength, so I don’t want to obscure the findings for either group.
Let’s start by plotting up the total lifts of men and women versus their age as a first pass.
There’s an obvious peak around age 25, with a decline afterwards. What does that mean? We have a ton of data and a lot of variability, so let’s take a closer look. I’m plotting the distribution of totals at different ages to see how well each age is represented and at what resolution.
What does this mean?
What you see on the x axis are the age groups that I added to the data earlier. The width of each violin shows how many lifters there are at a given total, and the tails up and down show the range of total lifts for that age group. The higher the plot, the stronger the lifters. The higher the horizontal line across a violin, the higher the median of that group.
What we see here is a pretty consistent pattern: both men and women peak overall in their 20s. How true is that, though?
I ran a linear model, which is basically a fancy way to statistically look at the relationship between variables. In our case, I want to know how much Age and Bodyweight affect one’s total, while accounting for Equipment type.
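To make the formula `TotalKg ~ Age + BodyweightKg + Equipment` less of a black box, here is a minimal Python/NumPy sketch of what such a model fits: ordinary least squares on a design matrix with an intercept, the two numeric predictors, and dummy columns for Equipment. R’s default treatment coding uses the first factor level (alphabetically, Multi-ply) as the baseline, which is why there is no `EquipmentMulti-ply` row in the summaries below. The data here are synthetic, noise-free, made-up numbers, so the fit should recover the coefficients exactly; this is not the author’s code.

```python
import numpy as np

# R orders factor levels alphabetically by default, so Multi-ply is the
# baseline level and gets no dummy column of its own.
LEVELS = ["Multi-ply", "Raw", "Single-ply", "Wraps"]


def design_matrix(ages, bodyweights, equipment):
    """Model matrix for TotalKg ~ Age + BodyweightKg + Equipment
    with treatment (dummy) coding for Equipment."""
    n = len(ages)
    columns = [
        np.ones(n),                            # intercept
        np.asarray(ages, dtype=float),         # Age
        np.asarray(bodyweights, dtype=float),  # BodyweightKg
    ]
    for level in LEVELS[1:]:
        # 1.0 where the lifter used this equipment, 0.0 otherwise
        columns.append(np.array([1.0 if e == level else 0.0 for e in equipment]))
    return np.column_stack(columns)


def fit_ols(X, y):
    """Ordinary least squares: coefficients minimising ||X @ b - y||^2."""
    coef, _residuals, _rank, _sv = np.linalg.lstsq(
        X, np.asarray(y, dtype=float), rcond=None
    )
    return coef


# Synthetic, noise-free example with known (made-up) coefficients:
# intercept, Age, BodyweightKg, Raw, Single-ply, Wraps
ages = [25, 30, 35, 40, 45, 50, 28, 33]
bodyweights = [80, 95, 70, 110, 90, 75, 100, 85]
equipment = ["Multi-ply", "Raw", "Single-ply", "Wraps",
             "Raw", "Multi-ply", "Wraps", "Single-ply"]
true_coef = np.array([400.0, -1.5, 4.0, -170.0, -75.0, -130.0])

X = design_matrix(ages, bodyweights, equipment)
y = X @ true_coef
print(np.round(fit_ols(X, y), 3))
```

With real data the totals are noisy, so the estimates come with standard errors, which is what the `Std. Error` and `Pr(>|t|)` columns in the R output report.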
##
## Call:
## lm(formula = TotalKg ~ Age + BodyweightKg + Equipment, data = men)
##
## Residuals:
## Min 1Q Median 3Q Max
## -752.39 -68.92 3.04 72.80 512.23
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 390.40274 3.05343 127.86 <2e-16 ***
## Age -1.19420 0.02499 -47.78 <2e-16 ***
## BodyweightKg 3.98592 0.01480 269.35 <2e-16 ***
## EquipmentRaw -170.38049 2.60886 -65.31 <2e-16 ***
## EquipmentSingle-ply -77.40003 2.61819 -29.56 <2e-16 ***
## EquipmentWraps -133.16085 2.67832 -49.72 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 111.6 on 133743 degrees of freedom
## (885 observations deleted due to missingness)
## Multiple R-squared: 0.4118, Adjusted R-squared: 0.4118
## F-statistic: 1.873e+04 on 5 and 133743 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = TotalKg ~ Age + BodyweightKg + Equipment, data = women)
##
## Residuals:
## Min 1Q Median 3Q Max
## -326.76 -46.84 -2.48 44.63 344.43
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 305.10687 4.21639 72.36 <2e-16 ***
## Age -0.41827 0.02673 -15.64 <2e-16 ***
## BodyweightKg 2.12087 0.01844 115.03 <2e-16 ***
## EquipmentRaw -134.68305 3.93889 -34.19 <2e-16 ***
## EquipmentSingle-ply -44.54240 3.95998 -11.25 <2e-16 ***
## EquipmentWraps -117.08138 4.01126 -29.19 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 72.4 on 56733 degrees of freedom
## (216 observations deleted due to missingness)
## Multiple R-squared: 0.3358, Adjusted R-squared: 0.3358
## F-statistic: 5737 on 5 and 56733 DF, p-value: < 2.2e-16
The important parts of the output above are two columns under “Coefficients”: “Estimate” and “Pr(>|t|)”. The Estimate column can be read as: “When you increase the parameter (in this case Age or Bodyweight) by one unit, holding the other variables constant, your TotalKg changes by this much.” This is cool! It’s way cool because even though the relationship between Age and Total weight lifted is significantly negative in both men and women (the Pr column shows a very small number), the effect is not very large. On average, men lose about 1.2 kg of their total per year of age, while women lose about 0.4 kg.
The much stronger relationship is between bodyweight and Total. Men gain almost 4 kg of total for each 1 kg of bodyweight, and for women the ratio is more like 2:1.
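One way to see what “changes by this much” means is to plug the men’s estimates from the summary above into the model equation. This Python sketch copies those coefficients verbatim; the helper name `predict_total` is hypothetical, and Multi-ply adds nothing because it is the baseline level.

```python
# Coefficients copied from the men's lm() summary above
MEN = {
    "(Intercept)": 390.40274,
    "Age": -1.19420,
    "BodyweightKg": 3.98592,
    "EquipmentRaw": -170.38049,
    "EquipmentSingle-ply": -77.40003,
    "EquipmentWraps": -133.16085,
}


def predict_total(coefs, age, bodyweight, equipment):
    """Predicted TotalKg from the additive model (hypothetical helper)."""
    total = coefs["(Intercept)"] + coefs["Age"] * age + coefs["BodyweightKg"] * bodyweight
    # Multi-ply is the baseline level, so it has no coefficient and adds 0
    total += coefs.get("Equipment" + equipment, 0.0)
    return total


base = predict_total(MEN, 30, 90, "Raw")
print(round(base, 2))
print(round(predict_total(MEN, 31, 90, "Raw") - base, 5))  # effect of one more year
print(round(predict_total(MEN, 30, 91, "Raw") - base, 5))  # effect of one more kg
```

For a 30-year-old, 90 kg raw lifter the men’s model predicts about 543 kg; one extra year lowers that by about 1.19 kg, while one extra kilogram of bodyweight raises it by about 3.99 kg.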
So maybe age is not that huge of a predictor of your lifting potential. Let’s dive in.
I’m going to cull the data to remove people under 24. In the time before 25, the total actually increases with age and it creates a weird hump that may be obscuring some patterns.
I’ll check what the patterns are for younger people later on!
Now the negative relationships with age are starting to become more and more obvious. Let’s take a look with our fancy linear model again.
##
## Call:
## lm(formula = TotalKg ~ Age + BodyweightKg + Equipment, data = men2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -757.61 -65.80 1.75 67.91 468.77
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 572.95247 3.56795 160.58 <2e-16 ***
## Age -4.28207 0.03359 -127.48 <2e-16 ***
## BodyweightKg 3.48279 0.01817 191.72 <2e-16 ***
## EquipmentRaw -180.19525 2.72644 -66.09 <2e-16 ***
## EquipmentSingle-ply -67.47764 2.73648 -24.66 <2e-16 ***
## EquipmentWraps -141.44523 2.81157 -50.31 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 104.4 on 75922 degrees of freedom
## (606 observations deleted due to missingness)
## Multiple R-squared: 0.4802, Adjusted R-squared: 0.4802
## F-statistic: 1.403e+04 on 5 and 75922 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = TotalKg ~ Age + BodyweightKg + Equipment, data = women2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -301.14 -46.28 -3.64 41.78 328.78
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 409.88862 4.79269 85.52 <2e-16 ***
## Age -2.21699 0.03891 -56.98 <2e-16 ***
## BodyweightKg 1.85859 0.02277 81.63 <2e-16 ***
## EquipmentRaw -151.43106 4.25818 -35.56 <2e-16 ***
## EquipmentSingle-ply -47.64090 4.29682 -11.09 <2e-16 ***
## EquipmentWraps -134.20279 4.34645 -30.88 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 70.47 on 33965 degrees of freedom
## (153 observations deleted due to missingness)
## Multiple R-squared: 0.376, Adjusted R-squared: 0.3759
## F-statistic: 4093 on 5 and 33965 DF, p-value: < 2.2e-16
What’s happening here
The relationship between Age and TotalKg is stronger now, because we are only looking at lifters past that early growth period. Men now lose more than 4 kg of total per year, while women lose about 2.2 kg. Good for women! One interpretation is that as you get older, it’s more difficult to lift heavy weights…duh. Obviously there is a bodyweight effect here too: the bigger you are, the more you lift. Interestingly, though, the bodyweight coefficient is smaller in this older subset than in the full dataset (about 3.5 vs 4.0 kg of total per kg of bodyweight for men, 1.9 vs 2.1 for women). That hints at an interaction between age and bodyweight: the older you are, the less bang for the buck you get from bodyweight. These models don’t include an Age × Bodyweight term, though, so is this real?
So far we have been looking across all equipment types. Let’s break it down and see whether these relationships hold for each one.
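Testing that hunch directly would mean adding an interaction term, which R’s formula syntax writes as `TotalKg ~ Age * BodyweightKg` (main effects plus `Age:BodyweightKg`). In such a model the payoff per kilogram of bodyweight is no longer a single number; it depends on age. The Python sketch below shows that arithmetic with hypothetical coefficients that were NOT estimated from this dataset.

```python
def predict_total(b0, b_age, b_bw, b_age_bw, age, bodyweight):
    """TotalKg from a model with an Age x BodyweightKg interaction
    (the kind of model R fits with TotalKg ~ Age * BodyweightKg)."""
    return b0 + b_age * age + b_bw * bodyweight + b_age_bw * age * bodyweight


def bodyweight_slope(b_bw, b_age_bw, age):
    """Extra TotalKg per extra kg of bodyweight at a given age."""
    return b_bw + b_age_bw * age


# Hypothetical coefficients, NOT fitted to the openpowerlifting data
coefs = dict(b0=300.0, b_age=-1.0, b_bw=4.5, b_age_bw=-0.02)
for age in (25, 40, 55):
    print(age, round(bodyweight_slope(coefs["b_bw"], coefs["b_age_bw"], age), 2))
```

With a negative interaction coefficient, the bodyweight slope in this made-up example shrinks from 4.0 at age 25 to 3.4 at age 55, which is the “less bang for the buck” pattern described above. Whether the real data support it would show up as a significant `Age:BodyweightKg` row in the model summary.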
## (Intercept) Age BodyweightKg
## 596.328109 -5.410918 3.687664
## (Intercept) Age BodyweightKg
## 406.302588 -3.593935 3.076590
## (Intercept) Age BodyweightKg
## 490.196102 -4.920829 3.928118
## (Intercept) Age BodyweightKg
## 443.402471 -4.223248 3.344580
## (Intercept) Age BodyweightKg
## 329.775536 -1.233394 2.425924
## (Intercept) Age BodyweightKg
## 270.472244 -1.781078 1.463824
## (Intercept) Age BodyweightKg
## 337.283066 -3.328599 2.894045
## (Intercept) Age BodyweightKg
## 268.636496 -1.665245 1.681378
So this is REALLY interesting! Let’s break down these results. Different equipment types differ in how strongly Age and Bodyweight affect the Totals:
Wraps - large loss of total with Age, decent gain of Total with Bodyweight
Wraps - very small loss of total with Age, small gain of Total with Bodyweight
Now this is super interesting. Is this a real relationship, or do we maybe just lack representation of older or bigger lifters in some leagues?
When we look at Age in each league, we do see proportionally slightly more older lifters in the Multi-ply and Single-ply categories.
There is a strange tendency toward heavier lifters in the Multi-ply and Wraps divisions for men. This could reflect differences in the weight categories offered by leagues that support multi-ply versus ones that don’t.
In women, we do see smaller lifters in the Single-ply category, but other than that the distributions look pretty similar. Remember, the wider the violin, the more lifters there are.
You can clearly see the weight classes in the wiggles of the distribution: each bulge is a high proportion of people likely at the top of their weight class, with fewer lifters at the bottom of a weight class.
Something smells fishy here, though! (And no, it’s not the knee sleeves I haven’t washed in 3 months.)
+ We saw that both Age and Weight have some effect on the weight people lift, but do we just see a large proportion of monster heavyweight lifters in the younger categories?
+ Let me clarify here: big guys lift big weights (generally). Case in point below (although the relationship is a little weaker in women).
But what is the distribution of weight in different ages?
The answer is sort of!
Look above. The mean weights do decrease slightly with age for both men and women…but! The upper ends of those violins show the heavy lifters we talked about! You can clearly see that by age 50 or so there are very few really heavy men and women. Hmmm!
Let’s look at the same type of relationship in total weight lifted. I think that when we compare the maximum weight lifted in each age category, the relationship looks very different than when we compare the means. These two metrics paint two separate pictures, and I think bodyweight may be part of the explanation (at least in this dataset).
What you see below is the median (less affected by skew than the mean) total weight lifted in each age category, as well as the maximum weight lifted, for men.
What you see is that the maximum weight lifted in each category declines much more rapidly than the median. This differs between Equipment types, with Raw showing the smoothest decline and Multi-ply the steepest.
As an example, let’s take the difference in median and maximum total between the 25-29 and 50-54 categories.
## $`Multi-ply`
## [1] 64.045
##
## $Raw
## [1] 55
##
## $`Single-ply`
## [1] 90
##
## $Wraps
## [1] 85
## $`Multi-ply`
## [1] 317.52
##
## $Raw
## [1] 177.5
##
## $`Single-ply`
## [1] 250
##
## $Wraps
## [1] 207.82
What you see here are the differences in median and then maximum total weight lifted between those two age categories. For the median, the difference is at most 90 kg in every equipment category, and it is as small as 55 kg in the Raw division.
When we look at maximums, large differences appear! Raw still has the smallest difference, at 177.5 kg, while the maximum Multi-ply total in the 25-29 category is 317.52 kg higher than the maximum in the 50-54 category. WHY?
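The comparison above boils down to two subtractions per equipment type. A Python sketch with hypothetical totals (not real meet results) shows why the two metrics can tell such different stories: a single very strong lifter in the young group barely moves the median but sets the maximum. The helper name `age_gaps` is made up for this illustration; to reproduce the lists above you would call it once per equipment type.

```python
from statistics import median


def age_gaps(totals, young="25-29", old="50-54"):
    """Difference in median and maximum TotalKg between two age groups.

    `totals` maps an age-group label to a list of totals for one equipment type.
    """
    young_totals, old_totals = totals[young], totals[old]
    return {
        "median_gap": median(young_totals) - median(old_totals),
        "max_gap": max(young_totals) - max(old_totals),
    }


# Hypothetical totals: similar typical lifters, one monster in the young group
totals = {
    "25-29": [500, 520, 540, 560, 900],
    "50-54": [470, 480, 500, 510, 600],
}
print(age_gaps(totals))
```

Here the median gap is 40 kg while the maximum gap is 300 kg, driven entirely by the one 900 kg total.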
Let’s confirm if there is a similar trend in women.
Absolutely there is!
## $`Multi-ply`
## [1] 0
##
## $Raw
## [1] 37.5
##
## $`Single-ply`
## [1] 90
##
## $Wraps
## [1] 41.25
## $`Multi-ply`
## [1] 165.41
##
## $Raw
## [1] 118.5
##
## $`Single-ply`
## [1] 153
##
## $Wraps
## [1] 225.5
We see that there is NO difference between median Multi-ply lifts in the 25-29 and 50-54 age ranges, but a difference of over 165 kg in the maximum lift!!
Let’s subset our data again and see who is lifting those weights and how much they weigh compared to their max counterparts in the older categories.
If that doesn’t show it, what does? The size and redness of each dot encode the bodyweight of the lifter who made the maximum lift. As you move to the older categories, you see a consistent decline in both the maximum lift and the bodyweight of the lifter who made it.
Let’s show this a little more explicitly with the relationship between the weight lifted and the weight of the lifter.
What you see is a rather strong correlation between the lifter’s bodyweight and the weight lifted. Remember, these are the BEST lifters of their age range from among 150,000 entries!
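The “best lifter per age range” step and the correlation can be sketched in plain Python. This assumes rows shaped like the dataset (`Group` for the age range, as in the `str()` output, plus `BodyweightKg` and `TotalKg`); the rows themselves are hypothetical, and `pearson` is a textbook Pearson correlation rather than whatever the original plot used.

```python
from math import sqrt


def best_per_group(rows, group_key="Group"):
    """Keep the row with the highest TotalKg in each age group."""
    best = {}
    for row in rows:
        g = row[group_key]
        if g not in best or row["TotalKg"] > best[g]["TotalKg"]:
            best[g] = row
    return best


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical rows, not real meet results
rows = [
    {"Group": "25-29", "BodyweightKg": 160, "TotalKg": 1100},
    {"Group": "25-29", "BodyweightKg": 90, "TotalKg": 800},
    {"Group": "40-44", "BodyweightKg": 140, "TotalKg": 1000},
    {"Group": "40-44", "BodyweightKg": 100, "TotalKg": 850},
    {"Group": "55-59", "BodyweightKg": 110, "TotalKg": 850},
]
best = best_per_group(rows)
r = pearson([b["BodyweightKg"] for b in best.values()],
            [b["TotalKg"] for b in best.values()])
print(sorted(best), round(r, 3))
```

Correlating only the per-group winners is a small sample (one point per age range), so a strong correlation here is suggestive rather than conclusive.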
Let’s check on this with the women lifter data.
This is not as convincing as the male data, but the overall trend is similar. Remember, in women there is also a much smaller effect of age on total weight lifted, and a weaker relationship between bodyweight and total weight lifted.
Let’s see the more explicit plot.
This is relatively consistent! A bit more noise than the male data, but a very similar relationship.
Now one last thing we need to check is whether this holds within weight classes across ages. Let’s first look at the performance of lifters in each weight class.
Let’s try to plot these in a meaningful way that lets us see progression with age within the same weight groups.
You can start seeing the trends here. The heaviest of people are no longer present in the older age groups. Now let’s see that distribution broken down into age groups over time a little more clearly.
I picked the weight group with the most entries (86-95 kg). Above, I am showing the performances of people in that weight group as they age, for men (top) and women (bottom). Well y’all, this looks pretty ideal to me… There’s a decline in performance with age, but it is not that dramatic. The means decline rather slowly, and while 25-year-olds definitely perform better overall, a good number of 50+ year-olds perform above the 25-year-old average.
What we see in this dataset is that Age does matter for the amount of weight you lift…but not as much as you might think! It depends on which type of lifting you do (Multi-ply, Raw…). But overall, the average lifts don’t drop by much with age. We are talking about 3 kg/year; over 30 years that’s 90 kg, or about 198 lb. Hypothetically, if you started with a 1500 lb total at 25, at 55 you could still pull about 1300.
Let me make this clear: these are trends! This doesn’t say that with a lot of training and a good regimen you can’t keep or exceed your gains past a certain point! Maybe when you get to your 60s it will get more difficult, as these trends show a steep drop-off at that point. The bigger influence on your lifts is your bodyweight. That being said:
DON’T TRY TO FATTEN UP JUST TO LIFT MORE as you get older. That comes with many negative health outcomes, so DO NOT TAKE THAT AWAY FROM THIS DATA. What you can take away is that if you plan to keep your weight, you don’t have to stop lifting. You’re not likely to lose much of your gainz over time.
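The claim that many 50+ lifters beat the 25-year-old average can be checked with a simple count within one weight group. This Python sketch assumes rows carrying the `GroupKG` and `Group` labels seen in the `str()` output; the rows are hypothetical, and the helper name is made up for this illustration. It raises `statistics.StatisticsError` if the young group is empty.

```python
from statistics import mean


def share_of_older_above_young_mean(rows, weight_group="86-95",
                                    young="25-29", min_age=50):
    """Within one weight group, return (mean TotalKg of the `young` age group,
    fraction of lifters in age groups starting at `min_age`+ who beat it)."""
    in_class = [r for r in rows if r["GroupKG"] == weight_group]
    young_mean = mean(r["TotalKg"] for r in in_class if r["Group"] == young)
    # Age-group labels look like "50-54"; the lower bound is before the dash
    older = [r for r in in_class if int(r["Group"].split("-")[0]) >= min_age]
    if not older:
        return young_mean, 0.0
    above = sum(1 for r in older if r["TotalKg"] > young_mean)
    return young_mean, above / len(older)


# Hypothetical rows, not real meet results
rows = [
    {"GroupKG": "86-95", "Group": "25-29", "TotalKg": 600},
    {"GroupKG": "86-95", "Group": "25-29", "TotalKg": 700},
    {"GroupKG": "86-95", "Group": "50-54", "TotalKg": 680},
    {"GroupKG": "86-95", "Group": "55-59", "TotalKg": 560},
    {"GroupKG": "86-95", "Group": "60-64", "TotalKg": 500},
    {"GroupKG": "96-105", "Group": "50-54", "TotalKg": 900},  # other weight group, ignored
]
print(share_of_older_above_young_mean(rows))
```

Restricting to one weight group first is the point: it removes the “heavier lifters are younger” effect before comparing ages.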
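The back-of-the-envelope projection is easy to check. This sketch uses the rough 3 kg/year rate from the paragraph above and the standard conversion of about 2.20462 lb per kg; the helper name is hypothetical, and the linear decline is a simplification of the trends, not a prediction for any individual.

```python
KG_TO_LB = 2.20462  # pounds per kilogram


def projected_total_lb(start_total_lb, kg_per_year, years):
    """Starting total (lb) minus a constant yearly loss given in kg."""
    loss_lb = kg_per_year * years * KG_TO_LB
    return start_total_lb - loss_lb


print(round(3 * 30 * KG_TO_LB, 1))              # total loss over 30 years, in lb
print(round(projected_total_lb(1500, 3, 30)))   # 1500 lb total at 25, projected to 55
```

Using the sex-specific slopes from the models above instead (roughly 4.3 kg/year for men, 2.2 kg/year for women past the early growth period) would give a larger and a smaller loss, respectively.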
I want to absolutely acknowledge that this is data of likely trained lifters. If you’re competing in your 50s and 60s, you’re likely to be a seasoned powerlifter, so don’t compare your numbers too harshly to everyone here. These folks are good!
Final message that I think is well supported with this data - don’t stop lifting, don’t stop asking questions!
PS:

## Younglings

For shits and giggles, let’s verify why I threw out the data of people under 22:
+ They create a hump in the distribution of Total weight lifted because they are still getting stronger. If we plot these side by side, you can see that lifters get stronger until about 23-25 and then plateau.
##
## Call:
## lm(formula = TotalKg ~ Age + BodyweightKg + Equipment, data = men3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -677.87 -61.04 0.61 61.70 449.85
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -26.23804 6.06224 -4.328 1.51e-05 ***
## Age 18.98153 0.14926 127.173 < 2e-16 ***
## BodyweightKg 3.56664 0.02074 171.999 < 2e-16 ***
## EquipmentRaw -141.29413 5.08609 -27.780 < 2e-16 ***
## EquipmentSingle-ply -61.31487 5.10453 -12.012 < 2e-16 ***
## EquipmentWraps -116.37323 5.17015 -22.509 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 95.58 on 57815 degrees of freedom
## (279 observations deleted due to missingness)
## Multiple R-squared: 0.5324, Adjusted R-squared: 0.5323
## F-statistic: 1.316e+04 on 5 and 57815 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = TotalKg ~ Age + BodyweightKg + Equipment, data = women3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -314.42 -40.58 -1.59 39.55 318.05
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 48.50689 8.71955 5.563 2.68e-08 ***
## Age 8.69720 0.14493 60.012 < 2e-16 ***
## BodyweightKg 2.37514 0.02706 87.772 < 2e-16 ***
## EquipmentRaw -86.42143 8.09560 -10.675 < 2e-16 ***
## EquipmentSingle-ply 0.67369 8.10626 0.083 0.934
## EquipmentWraps -76.46058 8.19974 -9.325 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 64.6 on 22762 degrees of freedom
## (63 observations deleted due to missingness)
## Multiple R-squared: 0.4606, Adjusted R-squared: 0.4605
## F-statistic: 3887 on 5 and 22762 DF, p-value: < 2.2e-16